DSA: Decentralized Double Stochastic Averaging Gradient Algorithm
Authors: Aryan Mokhtari, Alejandro Ribeiro
Abstract
This paper considers convex optimization problems in which the nodes of a network have access to summands of a global objective. Each of these local objectives is further assumed to be an average of a finite set of functions. The motivation for this setup is to solve large-scale machine learning problems where elements of the training set are distributed across multiple computational elements. The decentralized double stochastic averaging gradient (DSA) algorithm is proposed as a solution that relies on: (i) the use of local stochastic averaging gradients; (ii) determination of descent steps as differences of consecutive stochastic averaging gradients. Strong convexity of the local functions and Lipschitz continuity of the local gradients are shown to guarantee linear convergence in expectation of the sequence generated by DSA. Local iterates are further shown to approach the optimal argument for almost all realizations. The expected linear convergence of DSA is in contrast to the sublinear rate characteristic of existing methods for decentralized stochastic optimization. Numerical experiments on a logistic regression problem illustrate reductions in convergence time and in the number of feature vectors processed until convergence relative to these alternatives.
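To make the two ingredients above concrete, here is a minimal per-node sketch in Python of an update of this form: SAGA-style local stochastic averaging gradients combined with an EXTRA-style consensus recursion whose descent direction is the difference of consecutive averaging gradients. The function name dsa_style_update, the mixing-matrix choice W_tilde = (I + W)/2, and the step size alpha are illustrative assumptions rather than the paper's exact specification.

import numpy as np

def dsa_style_update(grad_fns, W, x0, alpha=1e-2, num_iters=1000, seed=0):
    """Illustrative DSA-style iteration (a sketch, not the paper's exact method).

    grad_fns[i] is a list of gradient functions, one per local sample at node i.
    W is an n-by-n doubly stochastic mixing matrix; x0 is an n-by-d array of
    initial local iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = x0.shape
    W_tilde = (np.eye(n) + W) / 2.0   # assumed EXTRA-style second mixing matrix

    # SAGA gradient tables: the last gradient evaluated for each local sample.
    tables = [np.array([g(x0[i]) for g in grad_fns[i]]) for i in range(n)]
    g_prev = np.array([tables[i].mean(axis=0) for i in range(n)])

    x_prev = x0.copy()
    x = W @ x_prev - alpha * g_prev   # first step: consensus plus averaging gradient

    for _ in range(num_iters):
        g = np.zeros((n, d))
        for i in range(n):
            idx = rng.integers(len(grad_fns[i]))       # draw one local sample
            new_grad = grad_fns[i][idx](x[i])
            # Local stochastic averaging (SAGA-style) gradient.
            g[i] = new_grad - tables[i][idx] + tables[i].mean(axis=0)
            tables[i][idx] = new_grad
        # Descent step built from the difference of consecutive stochastic
        # averaging gradients, plus an EXTRA-style consensus correction.
        x_next = x + W @ x - W_tilde @ x_prev - alpha * (g - g_prev)
        x_prev, x, g_prev = x, x_next, g
    return x

With a doubly stochastic mixing matrix of a connected network and a sufficiently small alpha, each row of the returned array would be expected to agree and approach the common minimizer, consistent with the linear convergence in expectation claimed in the abstract.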
Similar Resources
D$^2$: Decentralized Training over Decentralized Data
While training a machine learning model using multiple workers, each of which collects data from their own data sources, it would be most useful when the data collected from different workers can be unique and different. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are not too differ...
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Recent work shows that decentralized parallel stochastic gradient descent (D-PSGD) can outperform its centralized counterpart both theoretically and practically. While asynchronous parallelism is a powerful technology to improve the efficiency of parallelism in distributed machine learning platforms and has been widely used in many popular machine learning software and solvers based on centrali...
Averaging Asynchronously Using Double Linear Iterations
The distributed averaging problem is to devise a protocol which will enable the members of a group of n > 1 agents to asymptotically determine, in a decentralized manner, the average of the initial values of their scalar agreement variables. A typical averaging protocol can be modeled by a linear iterative equation whose update matrices are doubly stochastic. Building on the ideas proposed in [1...
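As a small illustration of such a protocol (an assumed Python example, not taken from the cited work), repeated multiplication by a doubly stochastic matrix W drives every agent's value toward the average of the initial values:

import numpy as np

# A doubly stochastic mixing matrix: rows and columns each sum to 1.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
x = np.array([1.0, 5.0, 9.0])   # initial agreement variables; their average is 5
for _ in range(50):
    x = W @ x                   # one round of the linear averaging iteration
print(x)                        # every entry is numerically 5.0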
Distributed Averaging using non-convex updates
Motivated by applications in distributed sensing, a significant amount of effort has been directed towards developing energy efficient algorithms for information exchange on graphs. The problem of distributed averaging has been studied intensively because it appears in several applications such as estimation on ad hoc wireless and sensor networks. A Gossip Algorithm is an averaging algorithm th...
Stochastic averaging for SDEs with Hopf Drift and polynomial diffusion coefficients
It is known that a stochastic differential equation (SDE) induces two probabilistic objects, namely a diffusion process and a stochastic flow. While the diffusion process is determined by the infinitesimal mean and variance given by the coefficients of the SDE, this is not the case for the stochastic flow induced by the SDE. In order to characterize the stochastic flow uniquely, the infinitesimal cov...
Journal: Journal of Machine Learning Research
Volume: 17
Pages: -
Published: 2016